-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
For SPARK-527, Support spark-shell when running on YARN #868
base: master
Are you sure you want to change the base?
Conversation
Thank you for your pull request. An admin will review this request soon. |
Hey Raymond, this is great. I tried this out quick and I have a couple of comments/questions. I had to use the repl shaded jar not the yarn jar because otherwise the workers would throw an exception about the ExecutorClassLoader not found. How did you modify the classpath to pick up the YarnClientImpl on the client side? If I run spark-shell it it gets the error: The paths to the example jar and yarn shaded jar in your instructions are incorrect (although the normal yarn example one is too), its in the examples/target/spark-examples-0.8.0-SNAPSHOT.jar. I'm not that familiar with the spark-shell yet, do you know how much load this puts on the client? thinking about it more its probably no more load then say the Hadoop apache Hive shell puts on the client. I tried this on both a non-secure and secure yarn cluster and both worked! thanks. |
@tgravescs glad to know that it works for you ;) you are right, when running spark-shell, the repl fat jar should be used. I will modify the doc. about the YarnClientImpl path, I don't do anything to make it work. I think this should already been include in the fat jar. As for me, as long as I export YARN_CONF_DIR, everything is fine. About the example and yarn shaded jar, it do have the scala-version prefix on my side, I am not sure whether it is because we don't have the same build env or? I use sbt/sbt assembly to prepare the package. I don't try mvn build. the shell by itself shouldn't put much load on the client I think. While other client program might be, say they do have heavy application logic or involves huge small tasks that need to talk to the executors a lot. Thanks for the comments ;) |
Cool - when you are done with this, can you help us make Shark work on YARN too? :) |
Ah, I'm using mvn that must produce different output directories then sbt. I'm not sure how you get YarnClientImpl unless sbt packages differently because it is not in the spark-repl-bin-0.8.0-SNAPSHOT-shaded.jar. It is in the spark-yarn-0.8.0-SNAPSHOT-shaded.jar. Are you setting the classpath any other way? Would you mind setting SPARK_PRINT_LAUNCH_COMMAND=1 and see what your classpath looks like? Seems like the computer_classpath script should include yarn or we should have a shaded jar that includes both now. |
sorry ignore my comment it does look like the sbt is adding the YarnClientImpl into the repl assembly jar. The mvn package command does not seem to do that. I'll file separate issue for that. |
Guys, regarding the output JARs, take a look at it after this patch: #857. It will make the JAR file the same for sbt and Maven. Matei On Aug 29, 2013, at 3:17 PM, tgravescs [email protected] wrote:
|
Also, I saw there's a TODO about lost executor IDs. How important is that? |
@mateiz oh, I think the detection of lost executor should already been done by Driver actor through akka remote events. I just wondering whether there are other approaching by Yarn framework that can be utilized to enhance or complement the error/fail detection. |
Will this be included in 0.8? |
With this scheduler, the user application is launched locally, While the executor will be launched by YARN on remote nodes. This enables spark-shell to run upon YARN.
Hi, I have the patch updated to the latest code , say sync with the new assembly, scripts and package path etc. doc on yarn also updated, Please take a review. |
Hey @colorant, I'm curious, can you submit this against the new Apache Spark repo (https://github.com/apache/incubator-spark)? It would be a great feature to include in the next 0.8.x release. |
@mateiz ok, sure, I will try to sync the code to current trunk and submit a pull request there. |
Minor cleanup following mesos#841. Author: Reynold Xin <[email protected]> Closes mesos#868 from rxin/schema-count and squashes the following commits: 5442651 [Reynold Xin] SPARK-1822: Some minor cleanup work on SchemaRDD.count()
In current YARN mode approaching, the application is run in the Application Master as a user program thus the whole spark context is on remote.
This approaching won't support application that involve local interaction and need to be run on where it is launched.
So In this pull request I have a YarnClientClusterScheduler and backend added.
With this scheduler, the user application is launched locally,While the executor will be launched by YARN on remote nodes with a thin AM which only launch the executor and monitor the Driver Actor status, so that when client app is done, it can finish the YARN Application as well.
This enables spark-shell to run upon YARN.
This also enable other Spark applications to have the spark context to run locally with a master-url "yarn-client". Thus e.g. SparkPi could have the result output locally on console instead of output in the log of the remote machine where AM is running on.
Docs also updated to show how to use this yarn-client mode.